The Problem with the Linpack Benchmark Matrix Generator

Author: Jack J. Dongarra
Julien Langou
Publication date: 01/01/2008

We characterize the matrix sizes for which the Linpack Benchmark matrix generator constructs a matrix with identical columns.

arXiv.org e-Print Archive

CiteSeerX

MIMS EPrints
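
As a rough illustration of how a pseudo-random matrix generator can produce identical columns (the topic of "The Problem with the Linpack Benchmark Matrix Generator" above), the Python sketch below fills a matrix column by column from a toy linear congruential generator. The parameters (a = 5, c = 1, m = 256) and all function names are assumptions chosen for illustration. They are not the Linpack Benchmark's actual generator, and the paper's characterization of the affected sizes is not reproduced here.

```python
def lcg_stream(seed, a=5, c=1, m=256):
    """Toy linear congruential generator with full period m = 256 (hypothetical parameters)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def generate_columns(n, seed=0):
    """Fill an n-by-n matrix column by column from one generator stream."""
    stream = lcg_stream(seed)
    return [[next(stream) for _ in range(n)] for _ in range(n)]


def duplicate_column_pairs(cols):
    """Return (first_index, later_index) for every column that repeats an earlier one."""
    seen = {}
    pairs = []
    for j, col in enumerate(cols):
        key = tuple(col)
        if key in seen:
            pairs.append((seen[key], j))
        else:
            seen[key] = j
    return pairs


if __name__ == "__main__":
    for n in (20, 32):
        pairs = duplicate_column_pairs(generate_columns(n))
        print(n, len(pairs), pairs[:3])
```

In this toy setting the stream repeats every 256 values, so column j and column j + k coincide whenever n * k is a multiple of 256. For n = 32 that happens at k = 8, while for n = 20 the smallest such k is 64, which exceeds the matrix width, so no columns repeat.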

The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer - Supercomputing History and the Immortality of Now

Author: Dongarra J.
Getov Vladimir
Walsh K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2018

A panel of experts reflects on the past 30 years of the Supercomputing (SC) conference, its leading role for the professional community, and some exciting future challenges.

Crossref

WestminsterResearch

Computing the Rank Profile Matrix

Author: Bourbaki N.
Dongarra J. J.
Grigor'ev D. Y.
Malaschonok G. I.
Storjohann A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/07/2015

The row (resp. column) rank profile of a matrix describes the staircase shape of its row (resp. column) echelon form. In an ISSAC'13 paper, we proposed a recursive Gaussian elimination that can compute simultaneously the row and column rank profiles of a matrix as well as those of all of its leading sub-matrices, in the same time as state-of-the-art Gaussian elimination algorithms. Here we first study the conditions under which a Gaussian elimination algorithm reveals this information. To that end, we propose the definition of a new matrix invariant, the rank profile matrix, summarizing all information on the row and column rank profiles of all the leading sub-matrices. We also explore the conditions for a Gaussian elimination algorithm to compute all or part of this invariant, through the corresponding PLUQ decomposition. As a consequence, we show that the classical iterative CUP decomposition algorithm can actually be adapted to compute the rank profile matrix. Used, in a Crout variant, as a base case for our ISSAC'13 implementation, it delivers a significant improvement in efficiency. Second, the row (resp. column) echelon forms of a matrix are usually computed via different dedicated triangular decompositions. We show here that, from some PLUQ decompositions, it is possible to recover the row and column echelon forms of a matrix and of any of its leading sub-matrices thanks to an elementary post-processing algorithm.

arXiv.org e-Print Archive

HAL-ENS-LYON

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot
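
To make the notion of a rank profile in "Computing the Rank Profile Matrix" above concrete, here is a minimal Python sketch that computes the row and column rank profiles of a small matrix, taking the row rank profile to be the lexicographically smallest list of linearly independent row indices of maximal length. It uses plain incremental Gaussian elimination over exact rationals. It is not the paper's recursive, PLUQ, or CUP algorithm, it does not compute the rank profile matrix itself, and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def row_rank_profile(rows):
    """Indices of the rows that are linearly independent of all earlier rows."""
    basis = []    # (pivot_column, reduced_row) pairs kept so far
    profile = []
    for i, row in enumerate(rows):
        v = [Fraction(x) for x in row]
        # Eliminate the components of v along the rows already in the basis.
        for pivot, b in basis:
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [x - factor * y for x, y in zip(v, b)]
        # A nonzero remainder means row i adds a new dimension.
        pivot = next((j for j, x in enumerate(v) if x != 0), None)
        if pivot is not None:
            basis.append((pivot, v))
            profile.append(i)
    return profile


def column_rank_profile(rows):
    """Column rank profile, computed as the row rank profile of the transpose."""
    return row_rank_profile([list(col) for col in zip(*rows)])


if __name__ == "__main__":
    A = [[1, 2, 3],
         [2, 4, 6],   # twice row 0
         [0, 1, 1],
         [1, 3, 4]]   # row 0 + row 2
    print(row_rank_profile(A))
    print(column_rank_profile(A))
```

For this example matrix the second row is a multiple of the first and the fourth is the sum of the first and third, so only rows 0 and 2 enter the row rank profile. The third column is the sum of the first two, so the column rank profile is columns 0 and 1.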

On the Performance Prediction of BLAS-based Tensor Contractions

Author: CL Lawson
E Napoli Di
G Baumgartner
J Čížek
JJ Dongarra
L Lehner
LE Kidder
Q Lu
R Iakymchuk
RJ Bartlett
T Helgaker
Publication date: 30/09/2014

Tensor operations are surging as the computational building blocks for a variety of scientific simulations, and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly optimized libraries (BLAS), for higher-dimensional tensors neither standards nor highly tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists in breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms in the bunch, without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.
Comment: Submitted to PMBS1

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University
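
The idea in "On the Performance Prediction of BLAS-based Tensor Contractions" above, that one contraction can be decomposed into matrix or vector operations in several ways, can be sketched in Python as follows. The example contraction C[a][b][c] = sum over i of A[a][i] * B[i][b][c], the nested-list representation, and the helper names are assumptions for illustration. The `matmul` and `matvec` helpers only stand in for BLAS kernels, and the paper's micro-benchmark-based performance prediction is not modeled.

```python
def matmul(A, B):
    """Plain matrix-matrix product (stand-in for a BLAS gemm kernel)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(A, x):
    """Plain matrix-vector product (stand-in for a BLAS gemv kernel)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def contract_reference(A, B):
    """Direct evaluation of C[a][b][c] = sum_i A[a][i] * B[i][b][c]."""
    ni, nb, nc = len(B), len(B[0]), len(B[0][0])
    return [[[sum(A[a][i] * B[i][b][c] for i in range(ni))
              for c in range(nc)] for b in range(nb)] for a in range(len(A))]


def contract_via_matmul(A, B):
    """One matrix-matrix product per fixed index c."""
    ni, nb, nc = len(B), len(B[0]), len(B[0][0])
    C = [[[0] * nc for _ in range(nb)] for _ in range(len(A))]
    for c in range(nc):
        B_slice = [[B[i][b][c] for b in range(nb)] for i in range(ni)]
        C_slice = matmul(A, B_slice)
        for a in range(len(A)):
            for b in range(nb):
                C[a][b][c] = C_slice[a][b]
    return C


def contract_via_matvec(A, B):
    """One matrix-vector product per fixed index pair (b, c)."""
    ni, nb, nc = len(B), len(B[0]), len(B[0][0])
    C = [[[0] * nc for _ in range(nb)] for _ in range(len(A))]
    for b in range(nb):
        for c in range(nc):
            y = matvec(A, [B[i][b][c] for i in range(ni)])
            for a in range(len(A)):
                C[a][b][c] = y[a]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[[1, 0, 2], [0, 1, 1]], [[2, 1, 0], [1, 1, 1]]]
    assert contract_via_matmul(A, B) == contract_reference(A, B)
    assert contract_via_matvec(A, B) == contract_reference(A, B)
```

Both decompositions give the same result but issue different numbers and sizes of kernel calls, which is the kind of choice the abstract says can be ranked without running every candidate.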

Developing numerical libraries in Java

Author: Boisvert Ronald F.
Dongarra Jack J.
Pozo Roldan
Remington Karin
Stewart G. W.
Publication date: 01/01/1998

The rapid and widespread adoption of Java has created a demand for reliable and reusable mathematical software components to support the growing number of compute-intensive applications now under development, particularly in science and engineering. In this paper we address practical issues of the Java language and environment that affect numerical library design and development. We present benchmarks that illustrate the current levels of performance of key numerical kernels on a variety of Java platforms. Finally, a strategy for the development of a fundamental numerical toolkit for Java is proposed and its current status is described.
Comment: 11 pages. Revised version of paper presented to the 1998 ACM Conference on Java for High Performance Network Computing. To appear in Concurrency: Practice and Experience

arXiv.org e-Print Archive

CiteSeerX

The University of Manchester - Institutional Repository

Message-passing performance of various computers

Author: Jack J. Dongarra
Tom Dunigan
Publication venue: 'Wiley'
Publication date: 01/01/2005

Crossref

Message‐passing performance of various computers

Author: Jack J. Dongarra
Tom Dunigan
Publication venue: 'Wiley'
Publication date: 01/01/2002

Crossref

Improving the Accuracy of Computed Singular Values

Author: J. J. Dongarra
Wilkinson J. H.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'

Crossref

Implementing Dense Linear Algebra Algorithms Using Multitasking on the CRAY X-MP-4 (or Approaching the Gigaflop)

Author: Jack J. Dongarra
Tom Hewitt
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'

Crossref

Squeezing the most out of eigenvalue solvers on high-performance computers

Author: Dongarra Jack J.
Hammarling Sven
Kaufman Linda
Publication venue: Published by Elsevier Inc.
Publication date: 31/05/1986

This paper describes modifications to many of the standard algorithms used in computing eigenvalues and eigenvectors of matrices. These modifications can dramatically increase the performance of the underlying software on high-performance computers without resorting to assembler language, without significantly influencing the floating-point operation count, and without affecting the roundoff-error properties of the algorithms. The techniques are applied to a wide variety of algorithms and are beneficial in various architectural settings.

Elsevier - Publisher Connector